Results 1 - 18 of 18
1.
Bioinform Adv ; 2(1): vbac032, 2022.
Article in English | MEDLINE | ID: mdl-35669345

ABSTRACT

Motivation: Splice variant neoantigens are a potential source of tumor-specific antigen (TSA) that are shared between patients in a variety of cancers, including acute myeloid leukemia. Current tools for genomic prediction of splice variant neoantigens demonstrate promise. However, many tools have not been well validated with simulated and/or wet lab approaches, with no studies published that have presented a targeted immunopeptidome mass spectrometry approach designed specifically for identification of predicted splice variant neoantigens. Results: In this study, we describe NeoSplice, a novel computational method for splice variant neoantigen prediction based on (i) prediction of tumor-specific k-mers from RNA-seq data, (ii) alignment of differentially expressed k-mers to the splice graph and (iii) inference of the variant transcript with MHC binding prediction. NeoSplice demonstrates high sensitivity and precision (>80% on average across all splice variant classes) through in silico simulated RNA-seq data. Through mass spectrometry analysis of the immunopeptidome of the K562.A2 cell line compared against a synthetic peptide reference of predicted splice variant neoantigens, we validated 4 of 37 predicted antigens corresponding to 3 of 17 unique splice junctions. Lastly, we provide a comparison of NeoSplice against other splice variant prediction tools described in the literature. NeoSplice provides a well-validated platform for prediction of TSA vaccine targets for future cancer antigen vaccine studies to evaluate the clinical efficacy of splice variant neoantigens. Availability and implementation: https://github.com/Benjamin-Vincent-Lab/NeoSplice. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
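Illustrative note (not part of the published abstract): step (i) above, finding k-mers seen in tumor reads but not in matched normal reads, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not NeoSplice's implementation: the function names, the plain-string read representation and the `min_tumor_count` threshold are all hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across a list of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def tumor_specific_kmers(tumor_reads, normal_reads, k, min_tumor_count=2):
    """Return k-mers seen at least min_tumor_count times in tumor reads
    and never in normal reads (hypothetical threshold, for illustration)."""
    tumor = count_kmers(tumor_reads, k)
    normal = count_kmers(normal_reads, k)
    return {
        kmer
        for kmer, n in tumor.items()
        if n >= min_tumor_count and kmer not in normal
    }
```

For example, with tumor reads `["ACGTAC", "ACGTAC"]`, normal reads `["ACGTTT"]` and `k=4`, the shared k-mer `ACGT` is discarded and `CGTA` and `GTAC` are kept. A real pipeline would also need to handle reverse complements, sequencing errors and expression-level testing, which this sketch omits.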

2.
Cell Stem Cell ; 25(1): 149-164.e9, 2019 Jul 03.
Article in English | MEDLINE | ID: mdl-31230860

ABSTRACT

Direct cellular reprogramming provides a powerful platform to study cell plasticity and dissect mechanisms underlying cell fate determination. Here, we report a single-cell transcriptomic study of human cardiac (hiCM) reprogramming that utilizes an analysis pipeline incorporating current data normalization methods, multiple trajectory prediction algorithms, and a cell fate index calculation we developed to measure reprogramming progression. These analyses revealed hiCM reprogramming-specific features and a decision point at which cells either embark on reprogramming or regress toward their original fibroblast state. In combination with functional screening, we found that immune-response-associated DNA methylation is required for hiCM induction and validated several downstream targets of reprogramming factors as necessary for productive hiCM reprogramming. Collectively, this single-cell transcriptomics study provides detailed datasets that reveal molecular features underlying hiCM determination and rigorous analytical pipelines for predicting cell fate conversion.


Subjects
Fibroblasts/physiology; Myocytes, Cardiac/physiology; Single-Cell Analysis/methods; Animals; Cell Differentiation; Cell Lineage; Cellular Reprogramming; Cellular Reprogramming Techniques; Humans; Sequence Analysis, RNA; Transcriptome
3.
Nature ; 551(7678): 100-104, 2017 Nov 02.
Article in English | MEDLINE | ID: mdl-29072293

ABSTRACT

Direct lineage conversion offers a new strategy for tissue regeneration and disease modelling. Despite recent success in directly reprogramming fibroblasts into various cell types, the precise changes that occur as fibroblasts progressively convert to the target cell fates remain unclear. The inherent heterogeneity and asynchronous nature of the reprogramming process renders it difficult to study this process using bulk genomic techniques. Here we used single-cell RNA sequencing to overcome this limitation and analysed global transcriptome changes at early stages during the reprogramming of mouse fibroblasts into induced cardiomyocytes (iCMs). Using unsupervised dimensionality reduction and clustering algorithms, we identified molecularly distinct subpopulations of cells during reprogramming. We also constructed routes of iCM formation, and delineated the relationship between cell proliferation and iCM induction. Further analysis of global gene expression changes during reprogramming revealed unexpected downregulation of factors involved in mRNA processing and splicing. Detailed functional analysis of the top candidate splicing factor, Ptbp1, revealed that it is a critical barrier for the acquisition of cardiomyocyte-specific splicing patterns in fibroblasts. Concomitantly, Ptbp1 depletion promoted cardiac transcriptome acquisition and increased iCM reprogramming efficiency. Additional quantitative analysis of our dataset revealed a strong correlation between the expression of each reprogramming factor and the progress of individual cells through the reprogramming process, and led to the discovery of new surface markers for the enrichment of iCMs. In summary, our single-cell transcriptomics approaches enabled us to reconstruct the reprogramming trajectory and to uncover intermediate cell populations, gene pathways and regulators involved in iCM induction.


Subjects
Cellular Reprogramming/genetics; Fibroblasts/cytology; Fibroblasts/metabolism; Myocytes, Cardiac/cytology; Myocytes, Cardiac/metabolism; Single-Cell Analysis; Transcriptome; Algorithms; Animals; Cell Lineage/genetics; Down-Regulation/genetics; GATA4 Transcription Factor/genetics; Heterogeneous-Nuclear Ribonucleoproteins/deficiency; Heterogeneous-Nuclear Ribonucleoproteins/genetics; Heterogeneous-Nuclear Ribonucleoproteins/metabolism; MEF2 Transcription Factors/genetics; Mice; Polypyrimidine Tract-Binding Protein/deficiency; Polypyrimidine Tract-Binding Protein/genetics; Polypyrimidine Tract-Binding Protein/metabolism; RNA Splicing/genetics; RNA, Messenger/genetics; RNA, Messenger/metabolism; T-Box Domain Proteins/genetics
4.
Genome Biol ; 18(1): 138, 2017 Jul 24.
Article in English | MEDLINE | ID: mdl-28738873

ABSTRACT

Single cell experimental techniques reveal transcriptomic and epigenetic heterogeneity among cells, but how these are related is unclear. We present MATCHER, an approach for integrating multiple types of single cell measurements. MATCHER uses manifold alignment to infer single cell multi-omic profiles from transcriptomic and epigenetic measurements performed on different cells of the same type. Using scM&T-seq and sc-GEM data, we confirm that MATCHER accurately predicts true single cell correlations between DNA methylation and gene expression without using known cell correspondences. MATCHER also reveals new insights into the dynamic interplay between the transcriptome and epigenome in single embryonic stem cells and induced pluripotent stem cells.


Subjects
Algorithms; Epigenesis, Genetic; Histones/genetics; Induced Pluripotent Stem Cells/metabolism; Mouse Embryonic Stem Cells/metabolism; Single-Cell Analysis/methods; Transcriptome; Animals; DNA Methylation; Genome, Human; Histones/metabolism; Humans; Induced Pluripotent Stem Cells/cytology; Mice; Mouse Embryonic Stem Cells/cytology; Sequence Analysis, RNA
5.
Nucleic Acids Res ; 44(17): 8292-301, 2016 Sep 30.
Article in English | MEDLINE | ID: mdl-27530426

ABSTRACT

Genomic methods are used increasingly to interrogate the individual cells that compose specific tissues. However, current methods for single cell isolation struggle to phenotypically differentiate specific cells in a heterogeneous population and rely primarily on the use of fluorescent markers. Many cellular phenotypes of interest are too complex to be measured by this approach, making it difficult to connect genotype and phenotype at the level of individual cells. Here we demonstrate that microraft arrays, which are arrays containing thousands of individual cell culture sites, can be used to select single cells based on a variety of phenotypes, such as cell surface markers, cell proliferation and drug response. We then show that a common genomic procedure, RNA-seq, can be readily adapted to the single cells isolated from these rafts. We show that data generated using microrafts and our modified RNA-seq protocol compared favorably with the Fluidigm C1. We then used microraft arrays to select pancreatic cancer cells that proliferate in spite of cytotoxic drug treatment. Our single cell RNA-seq data identified several expected and novel gene expression changes associated with early drug resistance.


Subjects
Cell Separation/methods; Genomics/methods; Microarray Analysis; Animals; Cell Line, Tumor; Cell Proliferation/drug effects; Cells, Cultured; Deoxycytidine/analogs & derivatives; Deoxycytidine/pharmacology; Gene Expression Regulation, Neoplastic/drug effects; Humans; Mice; Reproducibility of Results; Sequence Analysis, RNA; Tumor Stem Cell Assay; Gemcitabine
6.
Genome Biol ; 17(1): 106, 2016 May 23.
Article in English | MEDLINE | ID: mdl-27215581

ABSTRACT

Single cell experiments provide an unprecedented opportunity to reconstruct a sequence of changes in a biological process from individual "snapshots" of cells. However, nonlinear gene expression changes, genes unrelated to the process, and the possibility of branching trajectories make this a challenging problem. We develop SLICER (Selective Locally Linear Inference of Cellular Expression Relationships) to address these challenges. SLICER can infer highly nonlinear trajectories, select genes without prior knowledge of the process, and automatically determine the location and number of branches and loops. SLICER recovers the ordering of points along simulated trajectories more accurately than existing methods. We demonstrate the effectiveness of SLICER on previously published data from mouse lung cells and neural stem cells.


Subjects
Algorithms; High-Throughput Nucleotide Sequencing; Sequence Analysis, RNA/methods; Single-Cell Analysis; Animals; Gene Regulatory Networks/genetics; Lung/cytology; Lung/metabolism; Mice; Neural Stem Cells/cytology; Neural Stem Cells/metabolism; RNA/genetics; Software
7.
Nucleic Acids Res ; 44(8): e73, 2016 May 05.
Article in English | MEDLINE | ID: mdl-26740580

ABSTRACT

Single cell RNA-seq experiments provide valuable insight into cellular heterogeneity but suffer from low coverage, 3' bias and technical noise. These unique properties of single cell RNA-seq data make study of alternative splicing difficult, and thus most single cell studies have restricted analysis of transcriptome variation to the gene level. To address these limitations, we developed SingleSplice, which uses a statistical model to detect genes whose isoform usage shows biological variation significantly exceeding technical noise in a population of single cells. Importantly, SingleSplice is tailored to the unique demands of single cell analysis, detecting isoform usage differences without attempting to infer expression levels for full-length transcripts. Using data from spike-in transcripts, we found that our approach detects variation in isoform usage among single cells with high sensitivity and specificity. We also applied SingleSplice to data from mouse embryonic stem cells and discovered a set of genes that show significant biological variation in isoform usage across the set of cells. A subset of these isoform differences are linked to cell cycle stage, suggesting a novel connection between alternative splicing and the cell cycle.


Subjects
Alternative Splicing/genetics; Cell Cycle/genetics; Computational Biology/methods; Embryonic Stem Cells/cytology; Protein Isoforms/genetics; Sequence Analysis, RNA/methods; Animals; Base Sequence; Gene Expression Profiling/methods; Mice; Models, Statistical; RNA/genetics
8.
RNA ; 21(7): 1375-89, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26015596

ABSTRACT

Existing methods for detecting RNA intermediates resulting from exonuclease degradation are low-throughput and laborious. In addition, mapping the 3' ends of RNA molecules to the genome after high-throughput sequencing is challenging, particularly if the 3' ends contain post-transcriptional modifications. To address these problems, we developed EnD-Seq, a high-throughput sequencing protocol that preserves the 3' end of RNA molecules, and AppEnD, a computational method for analyzing high-throughput sequencing data. Together these allow determination of the 3' ends of RNA molecules, including nontemplated additions. Applying EnD-Seq and AppEnD to histone mRNAs revealed that a significant fraction of cytoplasmic histone mRNAs end in one or two uridines, which have replaced the 1-2 nt at the 3' end of mature histone mRNA maintaining the length of the histone transcripts. Histone mRNAs in fly embryos and ovaries show the same pattern, but with different tail nucleotide compositions. We increase the sensitivity of EnD-Seq by using cDNA priming to specifically enrich low-abundance tails of known sequence composition allowing identification of degradation intermediates. In addition, we show the broad applicability of our computational approach by using AppEnD to gain insight into 3' additions from diverse types of sequencing data, including data from small capped RNA sequencing and some alternative polyadenylation protocols.


Subjects
High-Throughput Nucleotide Sequencing/methods; Animals; Base Sequence; Cells, Cultured; DNA Primers; DNA, Complementary/genetics; Drosophila; Histones/genetics; Humans; Polyadenylation; RNA, Messenger/genetics; Reverse Transcriptase Polymerase Chain Reaction
9.
BMC Genomics ; 16: 113, 2015 Feb 22.
Article in English | MEDLINE | ID: mdl-25765044

ABSTRACT

BACKGROUND: Recent studies have shown that some pseudogenes are transcribed and contribute to cancer when dysregulated. In particular, pseudogene transcripts can function as competing endogenous RNAs (ceRNAs). The high similarity of gene and pseudogene nucleotide sequence has hindered experimental investigation of these mechanisms using RNA-seq. Furthermore, previous studies of pseudogenes in breast cancer have not integrated miRNA expression data in order to perform large-scale analysis of ceRNA potential. Thus, knowledge of both pseudogene ceRNA function and the role of pseudogene expression in cancer are restricted to isolated examples. RESULTS: To investigate whether transcribed pseudogenes play a pervasive regulatory role in cancer, we developed a novel bioinformatic method for measuring pseudogene transcription from RNA-seq data. We applied this method to 819 breast cancer samples from The Cancer Genome Atlas (TCGA) project. We then clustered the samples using pseudogene expression levels and integrated sample-paired pseudogene, gene and miRNA expression data with miRNA target prediction to determine whether more pseudogenes have ceRNA potential than expected by chance. CONCLUSIONS: Our analysis identifies with high confidence a set of 440 pseudogenes that are transcribed in breast cancer tissue. Of this set, 309 pseudogenes exhibit significant differential expression among breast cancer subtypes. Hierarchical clustering using only pseudogene expression levels accurately separates tumor samples from normal samples and discriminates the Basal subtype from the Luminal and Her2 subtypes. Correlation analysis shows more positively correlated pseudogene-parent gene pairs and negatively correlated pseudogene-miRNA pairs than expected by chance. Furthermore, 177 transcribed pseudogenes possess binding sites for co-expressed miRNAs that are also predicted to target their parent genes. 
Taken together, these results increase the catalog of putative pseudogene ceRNAs and suggest that pseudogene transcription in breast cancer may play a larger role than previously appreciated.


Subjects
Breast Neoplasms/genetics; Pseudogenes/genetics; RNA/genetics; Transcription, Genetic; Breast Neoplasms/classification; Breast Neoplasms/pathology; Computational Biology; Female; Gene Expression Regulation, Neoplastic; High-Throughput Nucleotide Sequencing; Humans; Neoplasm Invasiveness/genetics
10.
IEEE Int Conf Robot Autom ; 2014: 5804-5810, 2014 May.
Article in English | MEDLINE | ID: mdl-25419474

ABSTRACT

We present CARRT* (Cache-Aware Rapidly Exploring Random Tree*), an asymptotically optimal sampling-based motion planner that significantly reduces motion planning computation time by effectively utilizing the cache memory hierarchy of modern central processing units (CPUs). CARRT* can account for the CPU's cache size in a manner that keeps its working dataset in the cache. The motion planner progressively subdivides the robot's configuration space into smaller regions as the number of configuration samples rises. By focusing configuration exploration in a region for periods of time, nearest neighbor searching is accelerated since the working dataset is small enough to fit in the cache. CARRT* also rewires the motion planning graph in a manner that complements the cache-aware subdivision strategy to more quickly refine the motion planning graph toward optimality. We demonstrate the performance benefit of our cache-aware motion planning approach for scenarios involving a point robot as well as the Rethink Robotics Baxter robot.

11.
Mol Cell ; 53(6): 1020-30, 2014 Mar 20.
Article in English | MEDLINE | ID: mdl-24656133

ABSTRACT

Histone mRNAs are rapidly degraded when DNA replication is inhibited during S phase with degradation initiating with oligouridylation of the stem loop at the 3' end. We developed a customized RNA sequencing strategy to identify the 3' termini of degradation intermediates of histone mRNAs. Using this strategy, we identified two types of oligouridylated degradation intermediates: RNAs ending at different sites of the 3' side of the stem loop that resulted from initial degradation by 3'hExo and intermediates near the stop codon and within the coding region. Sequencing of polyribosomal histone mRNAs revealed that degradation initiates and proceeds 3' to 5' on translating mRNA and that many intermediates are capped. Knockdown of the exosome-associated exonuclease PM/Scl-100, but not the Dis3L2 exonuclease, slows histone mRNA degradation consistent with 3' to 5' degradation by the exosome containing PM/Scl-100. Knockdown of No-go decay factors also slowed histone mRNA degradation, suggesting a role in removing ribosomes from partially degraded mRNAs.


Subjects
3' Untranslated Regions; Histones/genetics; Polyribosomes/genetics; RNA Stability; Uridine/metabolism; Base Sequence; Codon; Exoribonucleases/genetics; Exoribonucleases/metabolism; Exosome Multienzyme Ribonuclease Complex/genetics; Exosome Multienzyme Ribonuclease Complex/metabolism; Gene Expression Regulation, Developmental; Gene Library; HeLa Cells; Histones/metabolism; Humans; Jurkat Cells; Molecular Sequence Data; Nucleic Acid Conformation; Open Reading Frames; Polyribosomes/metabolism; S Phase/genetics; Sequence Analysis, RNA; Signal Transduction
12.
Genome Res ; 24(2): 241-50, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24158655

ABSTRACT

Comprehensive sequencing of human cancers has identified recurrent mutations in genes encoding chromatin regulatory proteins. For clear cell renal cell carcinoma (ccRCC), three of the five commonly mutated genes encode the chromatin regulators PBRM1, SETD2, and BAP1. How these mutations alter the chromatin landscape and transcriptional program in ccRCC or other cancers is not understood. Here, we identified alterations in chromatin organization and transcript profiles associated with mutations in chromatin regulators in a large cohort of primary human kidney tumors. By associating variation in chromatin organization with mutations in SETD2, which encodes the enzyme responsible for H3K36 trimethylation, we found that changes in chromatin accessibility occurred primarily within actively transcribed genes. This increase in chromatin accessibility was linked with widespread alterations in RNA processing, including intron retention and aberrant splicing, affecting ∼25% of all expressed genes. Furthermore, decreased nucleosome occupancy proximal to misspliced exons was observed in tumors lacking H3K36me3. These results directly link mutations in SETD2 to chromatin accessibility changes and RNA processing defects in cancer. Detecting the functional consequences of specific mutations in chromatin regulatory proteins in primary human samples could ultimately inform the therapeutic application of an emerging class of chromatin-targeted compounds.


Subjects
Carcinoma, Renal Cell/genetics; Chromatin/genetics; Histone-Lysine N-Methyltransferase/genetics; Kidney Neoplasms/genetics; Carcinoma, Renal Cell/pathology; DNA-Binding Proteins; Gene Expression Regulation, Neoplastic; Histone-Lysine N-Methyltransferase/metabolism; Humans; Kidney Neoplasms/pathology; Mutation; Nuclear Proteins/genetics; RNA Processing, Post-Transcriptional/genetics; RNA Splicing/genetics; Transcription Factors/genetics; Transcription Factors/metabolism; Tumor Suppressor Proteins/genetics; Ubiquitin Thiolesterase/genetics
13.
Nucleic Acids Res ; 41(19): e178, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23935067

ABSTRACT

Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin ('mismapping') and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing.
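Illustrative note (not part of the published abstract): the filtering step described above amounts to discarding any called variant whose position and allele appear on the blacklist. The sketch below assumes variants are represented as `(chromosome, position, alternate_allele)` tuples; that representation and the function name are hypothetical and do not reflect BlackOPs' actual file formats or interface.

```python
def filter_against_blacklist(variants, blacklist):
    """Split called variants into those kept and those removed.

    variants  -- iterable of (chrom, pos, alt) tuples (assumed representation)
    blacklist -- iterable of (chrom, pos, alt) tuples attributed to mismapping
    """
    blocked = set(blacklist)  # set membership makes each lookup O(1)
    kept = [v for v in variants if v not in blocked]
    removed = [v for v in variants if v in blocked]
    return kept, removed
```

Matching on the allele as well as the position follows the abstract's description of a blacklist of "positions and alleles": a different alternate allele at a blacklisted position is not filtered by this sketch.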


Subjects
Genetic Variation; High-Throughput Nucleotide Sequencing/methods; Sequence Alignment/methods; Software; Artifacts; Cell Line, Tumor; Chromosome Mapping; Databases, Nucleic Acid; Exome; Humans; Polymorphism, Single Nucleotide; Sequence Analysis, DNA/methods; Sequence Analysis, RNA/methods
14.
J Comput Biol ; 20(3): 167-87, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23461570

ABSTRACT

The advent of high throughput RNA-seq technology allows deep sampling of the transcriptome, making it possible to characterize both the diversity and the abundance of transcript isoforms. Accurate abundance estimation or transcript quantification of isoforms is critical for downstream differential analysis (e.g., healthy vs. diseased cells) but remains a challenging problem for several reasons. First, while various types of algorithms have been developed for abundance estimation, short reads often do not uniquely identify the transcript isoforms from which they were sampled. As a result, the quantification problem may not be identifiable, i.e., lacks a unique transcript solution even if the read maps uniquely to the reference genome. In this article, we develop a general linear model for transcript quantification that leverages reads spanning multiple splice junctions to ameliorate identifiability. Second, RNA-seq reads sampled from the transcriptome exhibit unknown position-specific and sequence-specific biases. We extend our method to simultaneously learn bias parameters during transcript quantification to improve accuracy. Third, transcript quantification is often provided with a candidate set of isoforms, not all of which are likely to be significantly expressed in a given tissue type or condition. By resolving the linear system with LASSO, our approach can infer an accurate set of dominantly expressed transcripts while existing methods tend to assign positive expression to every candidate isoform. Using simulated RNA-seq datasets, our method demonstrated better quantification accuracy and the inference of dominant set of transcripts than existing methods. The application of our method on real data experimentally demonstrated that transcript quantification is effective for differential analysis of transcriptomes.


Subjects
Computational Biology/methods; RNA, Messenger/genetics; Sequence Analysis, RNA/methods; Statistics as Topic; Algorithms; Computer Simulation; Humans; Linear Models; MCF-7 Cells; RNA, Messenger/metabolism; Transcriptome/genetics
15.
Nucleic Acids Res ; 41(2): e39, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23155066

ABSTRACT

The RNA transcriptome varies in response to cellular differentiation as well as environmental factors, and can be characterized by the diversity and abundance of transcript isoforms. Differential transcription analysis, the detection of differences between the transcriptomes of different cells, may improve understanding of cell differentiation and development and enable the identification of biomarkers that classify disease types. The availability of high-throughput short-read RNA sequencing technologies provides in-depth sampling of the transcriptome, making it possible to accurately detect the differences between transcriptomes. In this article, we present a new method for the detection and visualization of differential transcription. Our approach does not depend on transcript or gene annotations. It also circumvents the need for full transcript inference and quantification, which is a challenging problem because of short read lengths, as well as various sampling biases. Instead, our method takes a divide-and-conquer approach to localize the difference between transcriptomes in the form of alternative splicing modules (ASMs), where transcript isoforms diverge. Our approach starts with the identification of ASMs from the splice graph, constructed directly from the exons and introns predicted from RNA-seq read alignments. The abundance of alternative splicing isoforms residing in each ASM is estimated for each sample and is compared across sample groups. A non-parametric statistical test is applied to each ASM to detect significant differential transcription with a controlled false discovery rate. The sensitivity and specificity of the method have been assessed using simulated data sets and compared with other state-of-the-art approaches. 
Experimental validation using qRT-PCR confirmed a selected set of genes that are differentially expressed in a lung differentiation study and a breast cancer data set, demonstrating the utility of the approach applied on experimental biological data sets. The software of DiffSplice is available at http://www.netlab.uky.edu/p/bioinfo/DiffSplice.


Subjects
Alternative Splicing; Gene Expression Profiling; Sequence Analysis, RNA; Breast Neoplasms/genetics; Breast Neoplasms/metabolism; Cell Differentiation; Female; Genome, Human; Humans; Lung/cytology; Lung/metabolism; Software; Transcriptome
16.
Bioinformatics ; 27(19): 2633-40, 2011 Oct 01.
Article in English | MEDLINE | ID: mdl-21824971

ABSTRACT

MOTIVATION: In eukaryotic cells, alternative splicing expands the diversity of RNA transcripts and plays an important role in tissue-specific differentiation, and can be misregulated in disease. To understand these processes, there is a great need for methods to detect differential transcription between samples. Our focus is on samples observed using short-read RNA sequencing (RNA-seq). METHODS: We characterize differential transcription between two samples as the difference in the relative abundance of the transcript isoforms present in the samples. The magnitude of differential transcription of a gene between two samples can be measured by the square root of the Jensen-Shannon Divergence (JSD*) between the gene's transcript abundance vectors in each sample. We define a weighted splice-graph representation of RNA-seq data, summarizing in compact form the alignment of RNA-seq reads to a reference genome. The flow difference metric (FDM) identifies regions of differential RNA transcript expression between pairs of splice graphs, without need for an underlying gene model or catalog of transcripts. We present a novel non-parametric statistical test between splice graphs to assess the significance of differential transcription, and extend it to group-wise comparison incorporating sample replicates. RESULTS: Using simulated RNA-seq data consisting of four technical replicates of two samples with varying transcription between genes, we show that (i) the FDM is highly correlated with JSD* (r=0.82) when average RNA-seq coverage of the transcripts is sufficiently deep; and (ii) the FDM is able to identify 90% of genes with differential transcription when JSD* >0.28 and coverage >7. This represents higher sensitivity than Cufflinks (without annotations) and rDiff (MMD), which respectively identified 69 and 49% of the genes in this region as differentially transcribed.
Using annotations identifying the transcripts, Cufflinks was able to identify 86% of the genes in this region as differentially transcribed. Using experimental data consisting of four replicates each for two cancer cell lines (MCF7 and SUM102), FDM identified 1425 genes as significantly different in transcription. Subsequent study of the samples using quantitative real time polymerase chain reaction (qRT-PCR) of several differential transcription sites identified by FDM, confirmed significant differences at these sites. AVAILABILITY: http://csbio-linux001.cs.unc.edu/nextgen/software/FDM CONTACT: darshan@email.unc.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
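Illustrative note (not part of the published abstract): JSD*, the square root of the Jensen-Shannon divergence between two transcript abundance vectors, can be written as

JSD(P, Q) = (1/2) KL(P || M) + (1/2) KL(Q || M), with M = (P + Q)/2, and JSD* = sqrt(JSD).

The sketch below is a minimal stand-alone version, not the authors' code. It assumes the abundance vectors are non-negative, are normalized to sum to 1 before comparison, and that logarithms are taken base 2 (so JSD* lies between 0 and 1); the abstract does not state the base, and the function names are hypothetical.

```python
import math


def _normalize(abundances):
    """Scale a non-negative abundance vector so that it sums to 1."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundance vector must have a positive sum")
    return [x / total for x in abundances]


def _kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd_sqrt(abundances_a, abundances_b):
    """Square root of the Jensen-Shannon divergence (base-2 logs assumed)."""
    if len(abundances_a) != len(abundances_b):
        raise ValueError("vectors must list the same isoforms in the same order")
    p = _normalize(abundances_a)
    q = _normalize(abundances_b)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    jsd = 0.5 * _kl_divergence(p, m) + 0.5 * _kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding error
```

Under these assumptions, two samples with identical isoform proportions give 0, and a complete isoform switch (for example `[1, 0]` versus `[0, 1]`) gives 1.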


Subjects
Alternative Splicing; RNA/genetics; Sequence Analysis, RNA/methods; Transcriptome/genetics; Gene Expression Profiling/methods; Genome; Humans; Models, Genetic; Protein Isoforms/genetics; Transcription, Genetic
17.
Nucleic Acids Res ; 38(18): e178, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20802226

ABSTRACT

The accurate mapping of reads that span splice junctions is a critical component of all analytic techniques that work with RNA-seq data. We introduce a second generation splice detection algorithm, MapSplice, whose focus is high sensitivity and specificity in the detection of splices as well as CPU and memory efficiency. MapSplice can be applied to both short (<75 bp) and long reads (≥ 75 bp). MapSplice is not dependent on splice site features or intron length, consequently it can detect novel canonical as well as non-canonical splices. MapSplice leverages the quality and diversity of read alignments of a given splice to increase accuracy. We demonstrate that MapSplice achieves higher sensitivity and specificity than TopHat and SpliceMap on a set of simulated RNA-seq data. Experimental studies also support the accuracy of the algorithm. Splice junctions derived from eight breast cancer RNA-seq datasets recapitulated the extensiveness of alternative splicing on a global level as well as the differences between molecular subtypes of breast cancer. These combined results indicate that MapSplice is a highly accurate algorithm for the alignment of RNA-seq reads to splice junctions. Software download URL: http://www.netlab.uky.edu/p/bioinfo/MapSplice.


Subjects
Algorithms; Alternative Splicing; RNA Splice Sites; Sequence Analysis, RNA; Software; Breast Neoplasms/genetics; Female; Gene Expression Profiling; Humans
18.
Bioinformatics ; 26(16): 1950-7, 2010 Aug 15.
Article in English | MEDLINE | ID: mdl-20576625

ABSTRACT

MOTIVATION: The RNA-seq paired-end read (PER) protocol samples transcript fragments longer than the sequencing capability of today's technology by sequencing just the two ends of each fragment. Deep sampling of the transcriptome using the PER protocol presents the opportunity to reconstruct the unsequenced portion of each transcript fragment using end reads from overlapping PERs, guided by the expected length of the fragment. METHODS: A probabilistic framework is described to predict the alignment to the genome of all PER transcript fragments in a PER dataset. Starting from possible exonic and spliced alignments of all end reads, our method constructs potential splicing paths connecting paired ends. An expectation maximization method assigns likelihood values to all splice junctions and assigns the most probable alignment for each transcript fragment. RESULTS: The method was applied to 2 x 35 bp PER datasets from cancer cell lines MCF-7 and SUM-102. PER fragment alignment increased the coverage 3-fold compared to the alignment of the end reads alone, and increased the accuracy of splice detection. The accuracy of the expectation maximization (EM) algorithm in the presence of alternative paths in the splice graph was validated by qRT-PCR experiments on eight exon skipping alternative splicing events. PER fragment alignment with long-range splicing confirmed 8 out of 10 fusion events identified in the MCF-7 cell line in an earlier study (Maher et al., 2009). AVAILABILITY: Software available at http://www.netlab.uky.edu/p/bioinfo/MapSplice/PER.


Subjects
RNA, Messenger/chemistry; Sequence Alignment; Sequence Analysis, RNA/methods; Algorithms; Alternative Splicing; Base Sequence; Exons; Gene Expression Profiling; Genome; Humans; Probability; Software